Secrets of RLHF in Large Language Models Part I: PPO

Published in Instruction Workshop @ NeurIPS 2023, 2023